 natural language data


A distributional simplicity bias in the learning dynamics of transformers

Neural Information Processing Systems

The remarkable capability of over-parameterised neural networks to generalise effectively has been explained by invoking a "simplicity bias": neural networks prevent overfitting by initially learning simple classifiers before progressing to more complex, non-linear functions. While simplicity biases have been described theoretically and experimentally in feed-forward networks for supervised learning, the extent to which they also explain the remarkable success of transformers trained with self-supervised techniques remains unclear. In our study, we demonstrate that transformers, trained on natural language data, also display a simplicity bias. Specifically, they sequentially learn many-body interactions among input tokens, reaching a saturation point in the prediction error for low-degree interactions while continuing to learn high-degree interactions. To conduct this analysis, we develop a procedure to generate clones of a given natural language data set, which rigorously capture the interactions between tokens up to a specified order. This approach opens up the possibilities of studying how interactions of different orders in the data affect learning, in natural language processing and beyond.


TinyLlama: An Open-Source Small Language Model

arXiv.org Artificial Intelligence

Building on the architecture and tokenizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention (Dao, 2023)), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes.


Positional Description Matters for Transformers Arithmetic

arXiv.org Artificial Intelligence

Transformers, central to the successes of modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities, which paradoxically include remarkable coding abilities. We observe that a crucial challenge is their naive reliance on positional information to solve arithmetic problems with a small number of digits, leading to poor performance on larger numbers. Herein, we delve deeper into the role of positional encoding and propose several ways to fix the issue, either by modifying the positional encoding directly or by modifying the representation of the arithmetic task to leverage standard positional encoding differently. We investigate the value of these modifications for three tasks: (i) classical multiplication, (ii) length extrapolation in addition, and (iii) addition in natural language context. For (i), we train a small model on a small dataset (100M parameters and 300k samples) with remarkable aptitude in (direct, no scratchpad) 15-digit multiplication and essentially perfect accuracy up to 12 digits, while usual training in this context would give a model failing at 4-digit multiplication. In the experiments on addition, we use a mere 120k samples to demonstrate: for (ii), extrapolation from 10 digits to testing on 12-digit numbers, while usual training would have no extrapolation; and for (iii), almost perfect accuracy up to 5 digits, while usual training would be correct only up to 3 digits (which is essentially memorization with a training set of 120k samples).


Learn NLP the Stanford way -- Lesson 1

#artificialintelligence

The AI area of Natural Language Processing, or NLP, through its gigantic language models (yes, GPT-3, I'm watching you) presents what is perceived as a revolution in machines' capabilities to perform the most distinct language tasks. Due to that, the perception of the public as a whole is split: some perceive that these new language models are going to pave the way to a Skynet type of technology, while others dismiss them as hype-fueled technologies that will live on dusty shelves, or hard drives, in little to no time. Motivated by this, I'm creating this series of stories that will approach NLP from scratch in a friendly way. To join me, you'll need to have a little experience with Python and Jupyter Notebooks, and for the most part, I won't even ask you to have anything installed on your machine. This series will differ dramatically from the Stanford course in terms of the depth with which we'll approach statistics and calculus.


AI 2020: What lies ahead for natural language data

#artificialintelligence

Natural language technology has fueled a boom in AI adoption, as everyone from small businesses to large corporations seeks to introduce streamlined, automated language functions into their customer service and back-end systems. But it's also an area of confusion, owing to plenty of hype, and industries need to get through this confusion in order to bring the sophisticated natural language solutions of tomorrow to fruition. To gain a better understanding of what natural language AI will look like in 2020, we sat down with Alex Poulis. Alex is the senior director of AI at Transperfect, where he founded their Dataforce division, which focuses on training data for machine learning. He's been involved in language technologies since 2002, long before the world entered its current AI hype cycle, and previously worked with Lionbridge on their data collection efforts.


Kyobo Life Insurance rolls out AI-based underwriting system - The Digital Insurer

#artificialintelligence

In Korea, Kyobo Life has announced the launch of its new AI-based underwriting platform called Best Analysis and Rapid Outcome (BARO). The platform employs machine learning technology with the ability to process large amounts of natural language data. It provides real-time services to sales staff and customers. The platform leverages Kyobo Life's underwriting manual to facilitate online deliveries by enabling instant communication with its sales staff. BARO's intelligence allows for easy approval or denial of insurance contracts with the help of screening criteria for pre-existing conditions and medical history.


Text Clustering: Get quick insights from Unstructured Data

@machinelearnbot

In this two-part series, we will explore text clustering and how to get insights from unstructured data. The approach is meant to be powerful and industrial-strength. This post, the first part, focuses on the motivation. The second part will be about implementation.


How Natural Language Processing can Revolutionize Human Resources - Analytics in HR

#artificialintelligence

Natural language processing is an ever-growing interest area in the analytics application spectrum and is relevant to HR. In fact, it can revolutionize the quality of insights. In this article, we will explain how. Did you know that text analysis has been the most prevalent productivity tool for HR over the past three decades or so? It is very familiar to HR. HR has long been using Boolean keyword searches to identify good resumes and job applications.
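
The Boolean keyword search mentioned above can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (`tokenize`, `matches`), the `all_of` / `any_of` / `none_of` parameters, and the sample resumes are all hypothetical, and a real screening tool would handle phrases, stemming, and synonyms.

```python
import re

def tokenize(text):
    """Lowercase the text and return the set of word-like tokens it contains."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean keyword filter (hypothetical helper, terms must be lowercase).

    all_of  -> AND: every term must appear
    any_of  -> OR: at least one term must appear (ignored if empty)
    none_of -> NOT: no term may appear
    """
    words = tokenize(text)
    if not all(term in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in none_of)

# Made-up sample data, for illustration only.
resumes = {
    "candidate_a": "Python developer with SQL and machine learning experience",
    "candidate_b": "Java developer, some SQL, intern",
    "candidate_c": "Data analyst: Python, Excel, Tableau",
}

# Query: python AND (sql OR tableau) AND NOT intern
shortlist = [
    name
    for name, text in resumes.items()
    if matches(text, all_of=["python"], any_of=["sql", "tableau"], none_of=["intern"])
]
print(shortlist)  # ['candidate_a', 'candidate_c']
```

Because the filter only checks for exact tokens, it would miss a resume that says "Pythonic" or "structured query language". That limitation is the kind of gap that motivates moving from keyword search to NLP-based text analysis.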